In the context of corrosion engineering it is often natural to be concerned
with extreme events. This is because, firstly, it is these extreme events that
often lead to failure and, secondly, it may only be possible to measure the
extremes, with many of the underlying measurements being by their very nature
unobservable. Statistical methods relating to extreme value theory can be used
to model and predict the statistical behaviour of extremes such as the largest
pit, thinnest wall, maximum penetration or similar assessment of a corrosion
phenomenon. These techniques can be applied to the single largest value, or to
a given number of the largest values, measured over individual areas or
coupons; or to all values exceeding a given threshold. The data can be modeled
to account for dependence on environmental conditions, surface area examined,
and the duration of exposure or of experimentation. The application of a
selection of these techniques is demonstrated on data from industry and from
laboratory experiments.
1.  Introduction

Extremes are typically defined in two ways: either by selecting a suitable
threshold and then recording every observation above that threshold; or by
sorting the data, according to some a priori sampling scheme, so as to select
the one, two, three, etc., largest value(s). The way in which the extremes are
defined, and hence measured, then indicates the techniques appropriate for
modeling and prediction. Most of the statistical methods relating to extreme
values are based, in the first instance, on the assumption of an underlying
large sample of possible measurements, all nominally arising from a single
population of such possible measurements. For extreme value theory to be used,
it is then only necessary for the actual extremes to be measured. The other
possible measurements can be ignored and may even be unobservable with the
equipment used to measure the extremes. The extreme of interest may be a
maximum value or a minimum value. In this paper we will assume that maximum
values are of interest. In applications concerned with minima, negating the
variable of interest will transform the problem into one concerned with maxima.

The generalized Pareto distribution (GPD) is the standard family of
statistical distributions used as a basis for modeling data which arise as
exceedances over some threshold. The application of this approach to the first
of the above extreme value definitions is examined in the following section.
Methods to ensure the validity of the standard statistical assumptions while
accumulating such data are discussed. The generalized extreme value (GEV)
distribution can be shown to be the natural one to use for single extremes.
Data can arise as the largest value from each of a set of coupons (individual
specimens), or from partitioning an area into equal smaller areas and
selecting one maximum from each smaller area. The application of methods for
such single extremes is also considered. The joint generalized extreme value
distribution (JGEV) is the appropriate distribution family to use when the r
(say) largest values are extracted, instead of just the single largest value.
This provides a useful extension to the classical theory in such a way as to
match up with the common practice of measuring the few largest pits at any one
location undergoing pitting. Using the r extreme order statistics in this way
can increase the precision of the estimates in the model and hence improve
predictions.
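The two sampling schemes for maxima described above (one maximum per sub-area, or the r largest per sub-area) can be sketched in a few lines. This is only an illustration: the function name `r_largest_per_block`, the square tiling and the small array of depths are assumptions made for the example, not data or procedures from the studies discussed here.

```python
from heapq import nlargest

def r_largest_per_block(grid, block, r=1):
    """Split a 2-D grid of depths into block x block tiles and return the
    r largest values of each tile, largest first. Incomplete tiles at the
    right and bottom edges are ignored so that all areas are equal."""
    n_rows, n_cols = len(grid), len(grid[0])
    result = []
    for top in range(0, n_rows - block + 1, block):
        for left in range(0, n_cols - block + 1, block):
            tile = [grid[i][j]
                    for i in range(top, top + block)
                    for j in range(left, left + block)]
            result.append(nlargest(r, tile))
    return result

# Hypothetical 4 x 4 array of depths, split into four 2 x 2 tiles.
depths = [[1, 5, 2, 0],
          [3, 4, 9, 1],
          [0, 2, 7, 7],
          [6, 1, 3, 8]]
single_maxima = r_largest_per_block(depths, block=2, r=1)
two_largest = r_largest_per_block(depths, block=2, r=2)
print(single_maxima)  # → [[5], [9], [6], [8]]
print(two_largest)    # → [[5, 4], [9, 2], [6, 2], [8, 7]]
```

Single maxima of this kind would feed a GEV fit, while the r largest per tile would feed a JGEV fit.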

Dependence on time and area can be incorporated for prediction and
extrapolation purposes when applying these distributions, and methods for
modeling the dependence on, say, environmental conditions through covariates
are indicated.

2.  Exceedances  Above  a Threshold 

These are data collected on the basis of all values exceeding a specified
threshold, taken sufficiently “high” to imply that certain limiting
statistical results will hold. The data in Table 1, on pit depths in two
stainless steel roofs, were collected with just such a threshold, namely 6 μm,
in operation. This threshold qualifies as “high” on the basis that a much
lower one, such as 0.06 μm for example, would have produced a very much larger
sample of nascent pits. This is consistent with theories of pitting in steel
and other metals. See further argument supporting this approach in Ref. [1].
This type of data censoring can arise through built-in limits on measurement
capabilities or else through deliberate censoring of a given data set,
typically a dense time series, so as to isolate the important
Table 1. Pit depths above 6 μm in stainless steel sheet college roofs
(area 500 m²; samples 10 cm²; thickness 400 μm)

Roof 1 (50 months)

131  106  35  26  26  25  23  20  20  18  18  18  17  16  16  15  15  15  14  14
14  14  14  14  14  14  14  12  12  12  12  12  10  10  8  8  8  8  8  8  8  8  8

Roof 2 (29 months)

140  106  95  77  72  55  55  53  52  36  33  32  32  30  28  28  26  26  25  24
24  24  22  22  20  18  18  16  16  16  16  14  14  12  12  12  8  8  8

events. When such data are extracted from a regular grid of values, rather
than through the engineer visually identifying isolated corrosion phenomena
and taking one measurement on each, it may be necessary to edit the values so
as to extract only local cluster maxima rather than using all nearby points.
This is needed to “decouple” the recorded values and so validate the usual
assumption of statistical independence or exchangeability. A careful
combination of grid size (to match the scale of the phenomena being studied)
and threshold (to select for significant phenomena) may be all that is
necessary.
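One way to extract local cluster maxima from a regular grid is to group neighbouring grid points that exceed the threshold and keep a single peak per group. The sketch below assumes that two points belong to the same cluster when they are horizontal or vertical neighbours (4-connectivity); that rule, the function name `cluster_maxima` and the small example grid are illustrative assumptions, and other definitions of a cluster are equally possible.

```python
def cluster_maxima(grid, threshold):
    """Return one maximum exceedance (value - threshold) per connected
    cluster of grid points lying above the threshold."""
    n_rows, n_cols = len(grid), len(grid[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    maxima = []
    for i in range(n_rows):
        for j in range(n_cols):
            if seen[i][j] or grid[i][j] <= threshold:
                continue
            # Flood-fill the cluster containing (i, j), tracking its peak.
            stack, peak = [(i, j)], grid[i][j]
            seen[i][j] = True
            while stack:
                a, b = stack.pop()
                peak = max(peak, grid[a][b])
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    p, q = a + da, b + db
                    if (0 <= p < n_rows and 0 <= q < n_cols
                            and not seen[p][q] and grid[p][q] > threshold):
                        seen[p][q] = True
                        stack.append((p, q))
            maxima.append(peak - threshold)
    return maxima

# Hypothetical grid of depths with two separate pits above a threshold of 6.
heights = [[0, 7, 8, 0, 0],
           [0, 9, 0, 0, 0],
           [0, 0, 0, 0, 12],
           [0, 0, 0, 0, 10]]
print(cluster_maxima(heights, threshold=6))  # → [3, 6]
```

Using all five raw exceedances here would count the same two pits several times, which is the dependence that declustering is meant to remove.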

With this form of data set, both the number, $n$, of observations and their
observed values $\{y_i\}$ are necessarily random variables. It can be shown,
see for example Ref. [2], that, for sufficiently high thresholds, and for a
wide variety of initial distributions, this number, $n$, of the exceedances
has asymptotically a Poisson distribution (with parameter $\lambda$, say) and
their sizes, $y$, have a generalized Pareto distribution;

$$G(y) = 1 - (1 + \xi y/\sigma)^{-1/\xi}, \tag{1}$$

valid for $1 + \xi y/\sigma > 0$, with $\sigma > 0$ and
$-\infty < \xi < \infty$. In particular, if these distributional results hold
exactly for some particular threshold, $u$ say, then the maximum of this set
of values has a generalized extreme value distribution (see next section)
exactly, and this will be true for all higher thresholds. A check that the
distribution, Eq. (1), holds can be made by graphing the mean excess plot, in
which the mean exceedances in the data are plotted against increasing
threshold values. This plot should follow a straight line with slope
$\xi/(1-\xi)$ and intercept $\sigma/(1-\xi)$; with a horizontal plot
corresponding to $\xi = 0$ and a simple exponential distribution for the tail.
For extrapolation over larger areas, for extremes derived from random sampling
over a large structure, often the quantity of interest is the $N$th return
level

$$q_N = u - \frac{\sigma}{\xi}\left[1 - (\lambda N)^{\xi}\right],$$

where $N$ is either the number of “coupon multiples” as a measure of
structure size, or else the number of time intervals into the future. The
$N$th return level is interpreted as that level which would be exceeded on
average once every $N$ units of area (or time).
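A short derivation, under the Poisson and generalized Pareto assumptions stated above, shows where the return level expression comes from. Here $\lambda$ is the Poisson rate of exceedances above the threshold $u$, $\sigma$ and $\xi$ are the scale and shape parameters of Eq. (1), and it is assumed that $\lambda$ is expressed per unit of area (or time) with $N$ counted in the same units. The level $q_N$ is the one for which the expected number of exceedances in $N$ units equals one:

```latex
\lambda N \,\bigl[1 - G(q_N - u)\bigr] = 1
\;\Longrightarrow\;
\left(1 + \xi\,\frac{q_N - u}{\sigma}\right)^{-1/\xi} = \frac{1}{\lambda N}
\;\Longrightarrow\;
q_N = u - \frac{\sigma}{\xi}\left[1 - (\lambda N)^{\xi}\right].
```

Letting $\xi \to 0$ gives the exponential-tail case, $q_N = u + \sigma \log(\lambda N)$, while for $\xi < 0$ the return level can never exceed the finite end point $u - \sigma/\xi$.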
The data in Fig. 1(a) are 1024 values of “current noise” collected during a
study of the electrochemical nature of pitting. This series was “declustered”
using a moving window of width 40 to give the isolated maxima in Fig. 1(b). A
mean excess plot for the isolated maxima of the current noise data is given in
Fig. 1(c). Consideration of this plot suggests that either a large threshold
is required or that the exceedances arise from a mixture of the tails of
underlying distributions. For an electrochemical interpretation of this latter
phenomenon, it can be noted that large narrow current spikes have been
described as being typical of intermittent pitting corrosion, while steady
broader based but less variable current noise has been associated with general
corrosion, see for example Ref. [3]. Intermediate conditions can be associated
with persistent pitting, widely recognized as the most threatening scenario
for metal structures.
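The exact declustering rule used for the current noise series is not spelled out, so the following is only one plausible reading of a moving-window scheme: a point is kept when it is the largest value among its neighbours within half a window on either side. The function name `isolated_maxima`, the tie-breaking rule and the short example series are assumptions made for illustration.

```python
def isolated_maxima(series, width):
    """Keep index i only if series[i] is the largest value among the points
    within width // 2 positions either side of i (ties go to the earliest
    index in the window)."""
    half = width // 2
    peaks = []
    for i, value in enumerate(series):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        window = series[lo:hi]
        if value == max(window) and series.index(value, lo, hi) == i:
            peaks.append((i, value))
    return peaks

# Hypothetical short series containing two bursts of activity.
noise = [0, 1, 5, 2, 1, 0, 3, 9, 4, 0, 0, 0]
print(isolated_maxima(noise, width=4))  # → [(2, 5), (7, 9)]
```

In practice such a filter would normally be combined with a threshold, since a long flat stretch of low values can otherwise be reported as a peak.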


Fig.  1(a).  Current  noise  measurements  (sample  size  = 1024). 


Fig.  1(b).  Isolated  peaks  in  current  noise  measurements. 


Fig.  1(c).  Mean  excess  plot  for  current  noise  measurements. 
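The calculation behind a mean excess plot is simple enough to sketch directly. Because the current noise series itself is not tabulated, the example below uses the Roof 2 pit depths from Table 1 instead; the function name `mean_excess` and the grid of thresholds are illustrative choices.

```python
def mean_excess(values, thresholds):
    """For each threshold w, average (y - w) over the observations y > w."""
    points = []
    for w in thresholds:
        excesses = [y - w for y in values if y > w]
        if excesses:                      # skip thresholds with no data above
            points.append((w, sum(excesses) / len(excesses)))
    return points

# Roof 2 pit depths (in micrometres) as listed in Table 1.
roof2 = [140, 106, 95, 77, 72, 55, 55, 53, 52, 36, 33, 32, 32, 30, 28, 28,
         26, 26, 25, 24, 24, 24, 22, 22, 20, 18, 18, 16, 16, 16, 16, 14, 14,
         12, 12, 12, 8, 8, 8]

for w, m in mean_excess(roof2, thresholds=range(6, 101, 10)):
    print(f"threshold {w:3d}  mean excess {m:6.1f}")
```

Plotting the second column against the first gives the mean excess plot; a roughly straight stretch suggests a range of thresholds over which a generalized Pareto tail is plausible.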

The main difficulty which can arise with the threshold method is the choice of
an appropriate threshold, especially when there is no a priori reason for
choosing one particular threshold over another. In an experiment to consider
the prediction of extreme corrosion rates for carbon steel in a simulated
basalt groundwater [4], a number of 200 mm × 200 mm coupons were exposed for
varying lengths of time. These coupons, having been first cleaned to remove
all corrosion products, were profiled with spot heights taken at the nodes of
a 1 mm lattice. This then gave, after making an adjustment for the original
coupon surface, a 196 × 196 array of corrosion measurements. False-color
histogram-equalization techniques, displayed on computer monitors, were used
to validate and inspect the digitized spot heights from these coupons. A mean
excess plot for a typical coupon exposed for 26 weeks is shown in Fig. 2(a).
Note that this plot was drawn for both the raw exceedances and also for
declustered exceedances. The process of declustering essentially amounted to
identifying all those “pits” or clusters exceeding a particular threshold and
calculating the maximum exceedance for each “pit.” The mean excess plot
indicates that a range of possible thresholds (300 μm to 550 μm) would be
appropriate for model fitting. Table 2 gives the results for such model
fitting using maximum likelihood for a range of values of threshold. Here
$\lambda$ is the mean exceedance rate per m², $\sigma$ and $\xi$ are the
parameter estimates for the GPD, and $q_{25}$ and $q_{250}$ are those levels
that would be exceeded once on average every m² and every 10 m² respectively
(that is, every 25 and every 250 coupon areas of 0.04 m²). Standard errors are
given in brackets. If $q_{25}$ is considered, we see that its estimated value
decreases as the threshold increases, its value being highly sensitive to the
value of $\xi$.
Table 2. Summary of model fitting and prediction using maximum likelihood for the generalized Pareto
distribution for a typical 26 week basalt groundwater coupon profile (standard errors in brackets)

Threshold  Mean cluster     Number of  $\lambda$    $\sigma$     $\xi$          $q_{25}$     $q_{250}$
(μm)       exceedance (μm)  clusters
300        99               177        4425 (333)   98.0 (11)    0.01 (0.08)    1158 (260)   1406 (430)
350        92               146        3650 (302)   99.0 (11)    0.04 (0.09)    1205 (300)   1500 (527)
400        97               96         2400 (245)   104.3 (16)   -0.08 (0.11)   1004 (214)   1102 (322)
450        83               76         1900 (218)   83.4 (11)    -0.01 (0.13)   1057 (241)   1233 (405)
500        90               50         1250 (177)   102.6 (23)   -0.14 (0.17)   963 (213)    1037 (309)
550        87               29         725 (135)    108.5 (12)   -0.23 (0.31)   918 (250)    961 (339)

For higher thresholds the large negative value of $\xi$ is indicative of a
tail distribution which is shorter than exponential, so implying lower return
values. For lower thresholds the tail appears to be exponential, implying
relatively higher return values. This effect can be seen further in an
exponential probability plot of the exceedances above 300 μm, Fig. 2(b). As
the threshold increases more weight is given to the extreme observations,
which are themselves smaller than would be expected for an exponential tail.
The lack of an objective method for determining the correct threshold
therefore leads to difficulties in prediction.
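The sensitivity of the return level to the shape parameter can be seen by evaluating the return level formula of Sec. 2 for the first and last rows of Table 2. The sketch below assumes that the exceedance rate is expressed per m² and the area in m², so that their product plays the role of $\lambda N$; the function name `gpd_return_level` is hypothetical, and the inputs are the rounded estimates printed in the table, so small differences from the tabulated levels are to be expected.

```python
import math

def gpd_return_level(u, lam, sigma, xi, area):
    """Level exceeded on average once per `area`, given an exceedance rate
    `lam` per unit area above threshold u and GPD parameters sigma, xi."""
    n = lam * area
    if abs(xi) < 1e-12:                   # exponential-tail limit
        return u + sigma * math.log(n)
    return u - (sigma / xi) * (1.0 - n ** xi)

# (threshold, rate per m^2, sigma, xi) for the 300 and 550 micrometre rows.
rows = [(300, 4425, 98.0, 0.01), (550, 725, 108.5, -0.23)]
for u, lam, sigma, xi in rows:
    q1 = gpd_return_level(u, lam, sigma, xi, area=1.0)    # once per m^2
    q10 = gpd_return_level(u, lam, sigma, xi, area=10.0)  # once per 10 m^2
    print(f"u={u}: {q1:.0f} and {q10:.0f} micrometres")
    if xi < 0:
        # A negative shape parameter implies a finite upper end point.
        print(f"  end point of the fitted tail: {u - sigma / xi:.0f}")
```

For the 550 μm row the two levels already lie close to the finite end point of the fitted tail, whereas the near-exponential fit at 300 μm keeps growing with area.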

Fig. 2(a). Mean excess plot for typical 26 week basalt groundwater coupon
profile (horizontal axis: threshold/μm): ○ = mean declustered exceedances;
□ = mean of all exceedances.


Fig. 2(b). Exponential probability plot of declustered exceedances above
300 μm (axis label: plotting position).

3.  Extreme  Value  Distributions 

Data suitable for this type of analysis can arise as the largest value from
each of a set of coupons, or from dividing an area into equal smaller areas
and selecting one maximum from each smaller area, provided the scale of
division and corrosion patterns are compatible in the sense described above
for the generalized Pareto distribution. For a sample of $n$ independent
identically distributed random variables, the distribution of $x_{(n)}$, the
data maximum, depends on $n$. Suppose however that there exist location and
scale factors, $a_n$ and $b_n$ say, so that the rescaled variate,
$y = a_n + b_n x_{(n)}$, has a distribution which is independent of $n$. This
is the so-called “stability postulate,” and leads immediately to the
following functional equation (to be solved for $F$):
$F(x)^n = F(a_n + b_n x)$. The solution to this equation is the generalized
extreme value (GEV) distribution, which can be written in the following
3-parameter form:

$$F(x) = \exp\left\{-\left[1 + \xi(x - \mu)/\psi\right]^{-1/\xi}\right\}, \qquad \xi x > \xi\mu - \psi, \quad \psi > 0. \tag{2}$$

See for example Ref. [5]. Note also that if the assumption of independence is
relaxed, under general conditions the distribution, Eq. (2), is still the
appropriate one for maxima. It turns out that almost all standard
distributions satisfy the stability postulate asymptotically, although it is
only exactly true for the GEV distribution itself. This is exactly analogous
to the Central Limit Theorem for averages, which is satisfied asymptotically
by almost all standard distributions, but only holds exactly for an initial
Normal distribution. As with averages, which are assumed Normal, by the
Central Limit Theorem, and then fitted accordingly, so with maxima, it is
reasonable to assume a GEV distribution and fit accordingly. Since the
dependence of the stability coefficients, $a_n$, $b_n$, on $n$ is typically
logarithmic, or slower, we can extract maxima from samples which are roughly
the same size. In engineering practice this is often almost unverifiable, but
nevertheless a plausible assumption, since the bulk of the data, “too small
to be seen,” may be uncounted, let alone observed. The physical size of
components and common conditions may be the only justification.
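The stability postulate can be checked numerically for the simplest member of the family. For the standard Gumbel (type I) distribution, $F(x) = \exp(-e^{-x})$, the functional equation $F(x)^n = F(a_n + b_n x)$ holds exactly with $a_n = -\log n$ and $b_n = 1$, which also illustrates the logarithmic dependence on $n$. The function names below are illustrative.

```python
import math

def gumbel_cdf(x):
    """Standard Gumbel (type I) distribution function."""
    return math.exp(-math.exp(-x))

def max_cdf(x, n):
    """Distribution function of the maximum of n independent draws."""
    return gumbel_cdf(x) ** n

n = 50
a_n, b_n = -math.log(n), 1.0             # shift by log n, no rescaling
for x in (2.0, 4.0, 6.0):
    # The two columns agree up to floating-point rounding.
    print(x, max_cdf(x, n), gumbel_cdf(a_n + b_n * x))
```

Doubling the sample size shifts the distribution of the maximum by only $\log 2$, which is why maxima drawn from samples of roughly equal size can reasonably be pooled.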

For extrapolation over larger areas (for extremes derived from random sampling
over a large structure) or over longer time periods (for extremes derived from
sampling at regular intervals of time), the $N$th return level can be defined
by solving $F(x) = 1 - 1/N$. Again $N$ is interpreted as in the previous
section. Alternatively, after fitting the distribution to the given data, the
implied distribution of extreme values from future samples over larger areas
and longer lengths of time (with equal base populations) can be deduced and
properties such as the mean extreme, etc., inferred from this more fundamental
approach. For a full discussion see Ref. [1]. However, the return period
method is particularly easy to implement for type I extreme value probability
plots. For examples of these plots applied to pit depths in steels exposed to
marine environments see Refs. [6,7]. The parameters can also be regressed on
covariates as appropriate, to allow for dependence on measured environment
variables and/or time, see for example Ref. [8]. A more subtle approach for
modeling covariates would use an extreme value regression model of the sort
considered in the context of the Weibull distribution [9].
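Solving $F(x) = 1 - 1/N$ for the GEV form in Eq. (2) has a closed-form answer, sketched below. The symbols follow Eq. (2), with `mu`, `psi` and `xi` for location, scale and shape; the function names and the parameter values in the usage lines are hypothetical and chosen only to exercise the code.

```python
import math

def gev_cdf(x, mu, psi, xi):
    """GEV distribution function, Eq. (2); Gumbel form when xi is zero."""
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-(x - mu) / psi))
    z = 1.0 + xi * (x - mu) / psi
    if z <= 0.0:                          # outside the support
        return 0.0 if xi > 0 else 1.0
    return math.exp(-z ** (-1.0 / xi))

def gev_return_level(n, mu, psi, xi):
    """Solve F(x) = 1 - 1/n for x."""
    y = -math.log(1.0 - 1.0 / n)
    if abs(xi) < 1e-12:
        return mu - psi * math.log(y)
    return mu - (psi / xi) * (1.0 - y ** (-xi))

# Illustrative parameter values only (not estimates from any data set).
mu, psi, xi = 1.0, 0.3, -0.2
for n in (10, 100, 1000):
    print(n, round(gev_return_level(n, mu, psi, xi), 3))
print("upper end point:", mu - psi / xi)
```

With a negative shape parameter the return levels increase with $N$ but remain below the end point $\mu - \psi/\xi$.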

In Ref. [10] five circular coupons were exposed to a corrosive medium for each
of four different exposure times: 1000 h, 3000 h, 5000 h, and 8000 h. The
maximum pit depth was measured in each of six equal sectors on each specimen.
Nominally this gave 120 pit depths in all; however, for many coupons, pits
overlapped into a number of sectors and so the number of independent maxima
was significantly reduced. Figure 3 shows a plot of maximum pit depth against
exposure time for the resulting data. The plotted mean function and upper
bound are based on the fitting of a 4-parameter time dependent GEV
distribution for which $\mu_t = \mu t^{\beta}$, $\psi_t = \psi t^{\beta}$ and
$\xi$ is constant. This model gives

$$\mu_t = 0.912(\pm 0.063)\,t^{\beta}, \qquad \psi_t = 0.293(\pm 0.037)\,t^{\beta},$$

$$\beta = 0.298(\pm 0.051), \qquad \xi = -0.216(\pm 0.121).$$


Fig. 3. Maximum pit depths against time (years) for carbon steel in alkaline
conditions along with fitted mean function (-•-), upper bound and confidence
curves for the upper bound (---).
The corresponding mean function is
$\eta_t = [\theta + (\psi/\xi)\Gamma(1 - \xi)]\,t^{\beta} = \eta t^{\beta}$,
which agrees with the common assumption made in the corrosion literature of a
power law growth of the mean maximum pit depth with time [8,11,12]. The
implied upper bound is then
$\theta_t = \theta t^{\beta} = (\mu - \psi/\xi)\,t^{\beta}$. Such means and
bounds can be extrapolated out to larger areas of exposed metal and to longer
time periods using the methods described in Ref. [1]. Standard errors on the
upper bound were calculated by reparameterizing the problem and constructing a
profile likelihood for $\theta$, as in Ref. [2]. The negative value for the
shape parameter $\xi$ has been observed by the authors of this paper
consistently for corrosion phenomena of many types and in many environments.
This has important consequences for extrapolation since, in corrosion
engineering, return levels are often very large (e.g., it may only be possible
to inspect a small number of one meter sections of a buried pipeline which may
be hundreds of kilometers in length), and so for the range of values of $\xi$
encountered by the authors, the maximum will be very close to the upper bound
or end point of the distribution. This should be contrasted with the commonly
used $\xi = 0$, type I extreme value distribution [6-8,11], for which there is
no upper bound.
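As a rough numerical illustration, the mean function and upper bound can be evaluated from the point estimates quoted above for the data of Ref. [10]. The sketch assumes the power-law form $\mu_t = \mu t^{\beta}$, $\psi_t = \psi t^{\beta}$ with constant $\xi < 0$, and that $t$ is measured in the time units used for the fit (the axis of Fig. 3 suggests years, but this is an assumption). The function names are hypothetical and standard errors are ignored.

```python
import math

# Point estimates quoted in the text for the time dependent GEV fit.
MU, PSI, BETA, XI = 0.912, 0.293, 0.298, -0.216

def upper_bound(t):
    """End point theta_t = (mu - psi/xi) * t**beta, finite because xi < 0."""
    return (MU - PSI / XI) * t ** BETA

def mean_max(t):
    """Mean of the GEV maximum at time t, [theta + (psi/xi) Gamma(1 - xi)] t**beta."""
    theta = MU - PSI / XI
    return (theta + (PSI / XI) * math.gamma(1.0 - XI)) * t ** BETA

for t in (0.5, 1.0, 5.0, 20.0):
    print(f"t={t:5.1f}  mean={mean_max(t):.3f}  bound={upper_bound(t):.3f}")
```

The ratio of mean to bound is the same at every $t$ under this model, since both scale with $t^{\beta}$; it is extrapolation to larger areas, not longer times, that moves the expected maximum towards the bound.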


4.  Extreme  Order  Statistics 

There is a corresponding asymptotic result concerning the joint distribution
of the $r$ largest values, $x_{\max} = x_{(1)} \ge \dots \ge x_{(r)}$, from a
sample of independent identically distributed random variables. Data will in
general then consist of $m$ sets of such largest values. The joint generalized
extreme value distribution (JGEV) has density

$$f(x_1, x_2, \ldots, x_r) = \psi^{-r}\exp\left\{-\left[1 + \frac{\xi}{\psi}(x_r - \mu)\right]^{-1/\xi} - \left(\frac{1}{\xi} + 1\right)\sum_{j=1}^{r}\log\left[1 + \frac{\xi}{\psi}(x_j - \mu)\right]\right\}, \tag{3}$$

valid for $\xi x_j > \xi\mu - \psi = \xi\theta$, $\psi > 0$
($j = 1, \ldots, r$). See for example Ref. [13]. This is the appropriate
distribution to use when the $r$ (say) largest values are extracted from
coupons or sampled areas, instead of just the single largest value. This
provides a useful extension to the classical theory in such a way as to match
up with the common practice of measuring the few largest pits at any one
location undergoing pitting. Using all this information rather than just the
single largest extreme enables smaller confidence bands to be drawn around
predicted values. However, care is needed to ensure that $r$ is not taken so
large as to invalidate the choice of the asymptotic distribution, Eq. (3).
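For maximum likelihood fitting, the quantity needed is the logarithm of the joint density of the $r$ largest values, summed over the $m$ sets. A minimal sketch follows, using the standard form of that density with location `mu`, scale `psi` and shape `xi`; the function names and the example values are hypothetical, and no optimiser is included.

```python
import math

def jgev_loglik(xs, mu, psi, xi):
    """Log joint density of the r largest values xs (any order) from one
    sample; returns -inf outside the support."""
    if psi <= 0:
        return -math.inf
    r = len(xs)
    s = [(x - mu) / psi for x in xs]      # standardised values
    s_r = min(s)                          # the r-th largest is the smallest kept
    if abs(xi) < 1e-12:                   # Gumbel form of the density
        return -r * math.log(psi) - math.exp(-s_r) - sum(s)
    z = [1.0 + xi * v for v in s]
    if min(z) <= 0:
        return -math.inf
    z_r = 1.0 + xi * s_r
    return (-r * math.log(psi) - z_r ** (-1.0 / xi)
            - (1.0 / xi + 1.0) * sum(math.log(v) for v in z))

def total_loglik(samples, mu, psi, xi):
    """Sum the log densities over m independent sets of largest values."""
    return sum(jgev_loglik(xs, mu, psi, xi) for xs in samples)

# Hypothetical data: the two largest depths from each of three coupons.
samples = [[1.4, 1.2], [1.1, 0.9], [1.6, 1.5]]
print(total_loglik(samples, mu=1.0, psi=0.3, xi=-0.2))
```

With $r = 1$ the expression reduces to the log density of the GEV distribution, so the same function covers the single-maximum case.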

When $\xi = 0$, this model reduces to the Gumbel form of the JGEV with density

$$f(x_1, x_2, \ldots, x_r) = \psi^{-r}\exp\left\{-\exp\left[-\frac{1}{\psi}(x_r - \mu)\right] - \sum_{j=1}^{r}\frac{1}{\psi}(x_j - \mu)\right\}. \tag{4}$$

A useful diagnostic here is the joint Gumbel plot. When
$x_{(1)} \ge \dots \ge x_{(r)}$ have density, Eq. (4),
$E(x_{(i)}) = \mu - \psi\,\phi(i)$ (all $1 \le i \le r$) [14], where
$\phi(\cdot)$ is the digamma function. Thus a plot of the order statistics
against $-\phi(i)$ will give a straight line with slope $\psi$ and intercept
$\mu$ if the Gumbel form of the JGEV distribution is appropriate. Such a plot
is shown in Fig. 4 for each of the pitted college roofs data in Table 1. This
plot indicates that these extremes arise from perhaps a mixture of two tail
distributions. However it was assumed that $\xi = 0$ for both roofs and that,
for roof 1, the two largest values were outliers from the model, Eq. (4).
These two values were removed for the purpose of analysis, and the slopes and
intercepts resulting

Fig.  4.  Joint  Gumbel  plot  for  the  college  roof  data:  O — roof  1; 
□ — roof  2. 
were used as starting values for determining the maximum likelihood estimates
of the parameters in Eq. (4). The fitted values, with their standard errors,
were

$\mu = 54.2\ (\pm 7.9)$, $\psi = 12.5\ (\pm 2.1)$, roof 1,

$\mu = 103.2\ (\pm 15.8)$, $\psi = 26.0\ (\pm 4.2)$, roof 2.

These values are then available for the implied Gumbel distribution of the
maximum value, which has mean $\mu + 0.5772\psi$. This gives 61.4 μm for roof
1 and 118.2 μm for roof 2. Extrapolation could now proceed according to the
method described in the previous section, noting however that the mean of the
maximum for roof 1 is considerably out of line with the observed maximum of
131 μm.
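The joint Gumbel plot and the mean of the implied maximum can both be sketched with a few lines of arithmetic. The code below computes the plotting positions $-\phi(i)$ from the digamma function at integer arguments, fits an ordinary least-squares line to obtain rough starting values, and evaluates $\mu + 0.5772\psi$ for the fitted values quoted above. The function names are hypothetical, the least-squares line is only a starting point and not the maximum likelihood fit, and the Roof 2 values are those listed in Table 1.

```python
EULER_GAMMA = 0.5772156649015329

def digamma_int(i):
    """Digamma function at a positive integer: -gamma + sum_{k<i} 1/k."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, i))

def gumbel_plot_fit(values):
    """Least-squares line through (-digamma(i), x_(i)), with x_(1) the
    largest value; returns (intercept, slope) as rough (mu, psi)."""
    xs = sorted(values, reverse=True)
    ts = [-digamma_int(i) for i in range(1, len(xs) + 1)]
    t_bar, x_bar = sum(ts) / len(ts), sum(xs) / len(xs)
    slope = (sum((t - t_bar) * (x - x_bar) for t, x in zip(ts, xs))
             / sum((t - t_bar) ** 2 for t in ts))
    return x_bar - slope * t_bar, slope

def gumbel_mean_max(mu, psi):
    """Mean of a Gumbel distribution with location mu and scale psi."""
    return mu + EULER_GAMMA * psi

roof2 = [140, 106, 95, 77, 72, 55, 55, 53, 52, 36, 33, 32, 32, 30, 28, 28,
         26, 26, 25, 24, 24, 24, 22, 22, 20, 18, 18, 16, 16, 16, 16, 14, 14,
         12, 12, 12, 8, 8, 8]
print(gumbel_plot_fit(roof2))                 # rough starting (mu, psi)
print(round(gumbel_mean_max(54.2, 12.5), 1))  # → 61.4
print(round(gumbel_mean_max(103.2, 26.0), 1)) # → 118.2
```

Marked curvature in the plotted points, rather than a single straight line, is the kind of feature that would suggest a mixture of tails or a non-zero shape parameter.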

Reference [15] reports on an experiment where 15 low alloy steel specimens
were suspended in a deionized warm water bath under free corrosion conditions.
Specimens were removed at varying intervals up to 71 days; then, after
cleaning, pit depths and diameters were measured optically. A 4-parameter JGEV
distribution incorporating a power law dependence on time [16] was fitted to
these pit depths, utilizing the two largest pits from each side of the
specimens, giving parameter values:

$$\mu_t = 7.041(\pm 0.710)\,t^{\beta}, \qquad \psi_t = 0.467(\pm 0.066)\,t^{\beta},$$

$$\beta = 0.609(\pm 0.016), \qquad \xi = -0.513(\pm 0.126).$$

These are the maximum likelihood estimates for the data of Ref. [15], whose
authors were only able, at that time, to report initial probability weighted
moment and regression estimates. Figure 5 shows a plot of these data along
with the fitted mean function and upper bound, and confidence curves for the
upper bound calculated using the profile likelihood method discussed in the
previous section.

5.  Discussion 

A number of statistical techniques relating to extreme value theory have been
described and demonstrated on selected sets of corrosion data. Noting that
much corrosion data are inherently of an extreme nature, purely statistical
considerations along the lines described in this paper may be the only means
of determining numerical values for prediction of the maximum pit depth in an
area A at time t, for example, along with some estimate of

Fig. 5. First and second largest pit depths against time for low alloy steel
in deionized warm water, along with fitted mean function, upper bound and
confidence curves for the upper bound.

precision or possible error. There is much evidence in the literature that
$\xi < 0$ for the GEV distribution in the context of extremes of corrosion
phenomena. Return levels are often very large and so, for the range of values
of $\xi$ encountered, predicted maxima will often be very close to the implied
upper bound or end point of the distribution.

It should be noted, however, that with all the methods described here, there
are pitfalls. When modeling exceedances, for example, it is difficult to
choose the threshold objectively, and different thresholds can lead to
different predictions. Similar problems exist in the use of the r largest
order statistics and also the maximum itself. How many largest order
statistics should be used? When recording single maxima, how large should the
sampled area be? While some theoretical results are available to answer such
questions (e.g., Ref. [17]) these are not very helpful in a practical context.